Conversation
…nTicket records SSL_write can return SSL_ERROR_WANT_READ in TLS 1.3 when pending NewSessionTicket records need to be drained before application data can be sent. The old code treated any rc<=0 from SSL_write as fatal, causing spamc to silently fail to send the SPAMC protocol header and hang until spamd's 30-second timeout fired. Also fixes ssl_timeout_read's retry loop, which checked errno==EWOULDBLOCK instead of SSL_get_error()==SSL_ERROR_WANT_READ — the wrong error mechanism for OpenSSL. Changes: - spamc/libspamc.h: add SPAMC_DEBUG_SSL (1<<11) flag - spamc/spamc.c: add -D/--ssl-debug flag to trace SSL state to stderr - spamc/utils.c: fix ssl_timeout_read retry to use SSL_get_error() - spamc/libspamc.c: add _ssl_write_with_retry() helper that loops on WANT_READ/WANT_WRITE; add debug tracing after SSL_connect, around SSL_write, and on ssl_timeout_read failures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SSL_write(ssl, buf, 0) returns 0 with SSL_ERROR_SYSCALL, which the error-checking code treated as fatal. The non-SSL full_write path handles zero-length writes as a harmless no-op; match that behavior by skipping the SSL body write entirely when towrite_len == 0. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
grumpybozo
left a comment
There was a problem hiding this comment.
I have looked at the changes and they look good. I particularly like the idea of adding a debug option exposed to the command line.
I wonder if the debug option should be a more generic option rather then dedicated to ssl debugging. |
As mentioned in [either some back-and-forth with KAM (who wrote the original SSL support) and giovanni, and/or the tickets/notes here], the debugs were mainly added to try and figure out the actual SSL issues I was debugging, since at that point you can't easily debug it with wireshark (encrypted), and a bunch of opaque tls-encoded strings in a debugger are....bad, especially since it requires a build with debug symbols, which most users in the field reporting an issue won't have. (In spamd, you'd just stick a couple of warn's in and be done with it, but since spamc requires a recompile...yeah, annoying). If you can think of other places we'd like to add debugs, I'll happily add them, either to a different -Option or just make them spit out with the SSL debugs. I'd argue that since this push was related specifically to ssl, it makes sense to not have this feature a full debug-featureset insert. |
(Sorry about that, decided to create a new branch on my side for the spamc fixes, whereas my original PR was against my trunk branch).
Spamc has several bugs that cause TLS1.3 to not work, which is the default version that will be negotiated (spamd listens on TLS1.3 because it just does whatever IO:Socket:SSL supports).
I (not the LLM) diagnosed this by finding random disconnects when trying to do spamc -K -S connecting to an SSL-enabled spamd. Spamd would only log:
Apr 2 21:03:17 post spamd[89898]: prefork: child states: II
Apr 2 21:03:18 post spamc[89911]: SSL write failed (5)
After adding a bunch of debug prints to spamd, and forcing spamd to not do TLS1.3, I attempting connect with openssl s_client, which worked, but spamc did not.
The three bugs present in the current code are:
Bug 1: ssl_timeout_read retry loop checks wrong error mechanism spamc/utils.c — retry loop checked errno == EWOULDBLOCK instead of SSL_get_error() == SSL_ERROR_WANT_READ. OpenSSL uses its own error queue, not errno, so the retry never fired.
Bug 2: SSL_write not retried on SSL_ERROR_WANT_READ spamc/libspamc.c — In TLS 1.3, the server sends post-handshake NewSessionTicket records after the handshake completes. SSL_write can return SSL_ERROR_WANT_READ while these are pending. The original code treated any rc <= 0 from SSL_write as a fatal error with no retry.
Bug 3: SSL_write(ssl, buf, 0) treated as fatal error spamc/libspamc.c — For commands with no body (e.g. PING / -K), towrite_len == 0. Calling SSL_write with length 0 returns 0, which the rc <= 0 check treated as failure. The non-SSL full_write path handles zero-length writes as a no-op.
This code also adds a -D argument to spamc so that future SSL connect issues may be debugged (not recommended for normal use), because doing so with truss/strace is painful.
Tested via both:
spamc/spamc -S -D -l -d localhost -p 784 < t/data/spam/001 (actual message test)
spamc/spamc -S -D -l -K -d localhost -p 784 (send a test ping)